31 research outputs found

    GENCODE: reference annotation for the human and mouse genomes in 2023.

    GENCODE produces high-quality gene and transcript annotation for the human and mouse genomes. All GENCODE annotation is supported by experimental data and serves as a reference for genome biology and clinical genomics. The GENCODE consortium generates targeted experimental data, develops bioinformatic tools and carries out analyses that, along with externally produced data and methods, support the identification and annotation of transcript structures and the determination of their function. Here, we present an update on the annotation of human and mouse genes, including developments in the tools, data, analyses and major collaborations that underpin this progress. For example, we report the creation of a set of non-canonical ORFs identified in GENCODE transcripts, the LRGASP collaboration to assess the use of long transcriptomic data to build transcript models, the progress in collaborations with RefSeq and UniProt to increase convergence in the annotation of human and mouse protein-coding genes, the propagation of GENCODE across the human pan-genome and the development of new tools to support annotation of regulatory features by GENCODE. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.

    Transcriptional Analysis of Murine Macrophages Infected with Different Toxoplasma Strains Identifies Novel Regulation of Host Signaling Pathways

    Most isolates of Toxoplasma from Europe and North America fall into one of three genetically distinct clonal lineages, the type I, II and III lineages. In South America, however, these strains are rarely isolated; instead, a great variety of other strains are found. T. gondii strains differ widely in a number of phenotypes in mice, such as virulence, persistence, oral infectivity, migratory capacity, induction of cytokine expression and modulation of host gene expression. The outcome of toxoplasmosis in patients is also variable, and we hypothesize that, besides host and environmental factors, the genotype of the parasite strain plays a major role. The molecular basis for these differences in pathogenesis, especially in strains other than the clonal lineages, remains largely unexplored. Macrophages play an essential role in the early immune response against T. gondii and are also the cell type preferentially infected in vivo. To determine whether non-canonical Toxoplasma strains have unique interactions with the host cell, we infected murine macrophages with 29 different Toxoplasma strains, representing global diversity, and used RNA sequencing to determine host and parasite transcriptomes. We identified large differences between strains in the expression level of known parasite effectors and large chromosomal structural variation in some strains. We also identified novel strain-specifically regulated host pathways, including the regulation of the type I interferon response by some atypical strains. IFNβ production by infected cells was associated with parasite killing, was independent of interferon gamma activation, and depended on endosomal Toll-like receptors in macrophages and on the cytoplasmic receptor retinoic acid-inducible gene 1 (RIG-I) in fibroblasts.
    Funding: National Institutes of Health (U.S.) (R01-AI080621); New England Regional Center of Excellence for Biodefense and Emerging Infectious Diseases (Developmental Grant AIO57159); Pew Charitable Trusts (Biomedical Scholars Program); Robert A. Swanson Career Development award; The Knights Templar Eye Foundation, Inc.; Pre-Doctoral Grant in the Biological Sciences (5-T32-GM007287-33); Cleo and Paul Schimmel Foundation

    Inference of Regular Queries in Trees and Applications to Information Extraction on the Web

    This thesis addresses the inference of programs that extract information from the Web. It defends the following two ideas:
    - using the tree structure of Web documents makes it possible to define expressive and efficient extraction programs;
    - grammatical inference techniques on trees are well suited to inferring information extraction programs.

    Interactive Learning of Node Selecting Tree Transducers (Machine Learning manuscript)

    Keywords: … inference, tree automata, monadic queries.
    We develop new algorithms for learning monadic node selection queries in unranked trees from annotated examples, and apply them to visually interactive Web information extraction. We propose to represent monadic queries by bottom-up deterministic Node Selecting Tree Transducers (NSTTs), a particular class of tree automata that we introduce. We prove that deterministic NSTTs capture the class of queries definable in monadic second-order logic (MSO) in trees, which Gottlob and Koch (2002) argue to have the right expressiveness for Web information extraction, and prove that monadic queries defined by NSTTs can be answered efficiently. We present a new polynomial-time algorithm in RPNI style that learns monadic queries defined by deterministic NSTTs from completely annotated examples, where all selected nodes are distinguished. In practice, users prefer to provide partial annotations. We propose t
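    The abstract states that monadic queries are represented by deterministic NSTTs and can be answered efficiently. The following is a minimal Python sketch of query answering only (not of the RPNI-style learning algorithm), under stated assumptions: trees are already binary (the paper works on unranked trees, which this sketch assumes have been binary-encoded beforehand), and an NSTT is given as a deterministic bottom-up automaton over (label, bit) pairs that accepts at most one annotation per tree, the nodes with bit 1 being the selected ones. The class name `DeterministicNSTT`, the rule format and the example query are hypothetical.

```python
from typing import Dict, FrozenSet, Optional, Tuple

# Assumption: unranked trees have already been binary-encoded.
# A node is (label, children); children is () for a leaf or (left, right).
Tree = Tuple[str, tuple]
Path = Tuple[int, ...]          # child indices from the root; () is the root
Selection = FrozenSet[Path]     # the nodes annotated with bit 1


class DeterministicNSTT:
    """Hypothetical encoding of a deterministic bottom-up tree automaton over
    (label, bit) pairs, read as a node selecting tree transducer."""

    def __init__(self, leaf_rules: Dict[Tuple[str, int], str],
                 node_rules: Dict[Tuple[str, int, str, str], str],
                 final: set):
        self.leaf_rules = leaf_rules   # (label, bit) -> state
        self.node_rules = node_rules   # (label, bit, left_state, right_state) -> state
        self.final = final

    def _run(self, tree: Tree, path: Path) -> Dict[str, Selection]:
        """Map each state reachable on this subtree (under some annotation)
        to one annotation of the subtree that reaches it."""
        label, children = tree
        left = self._run(children[0], path + (0,)) if children else {}
        right = self._run(children[1], path + (1,)) if children else {}
        reached: Dict[str, Selection] = {}
        for bit in (0, 1):
            mark: Selection = frozenset({path}) if bit else frozenset()
            if not children:
                q = self.leaf_rules.get((label, bit))
                if q is not None:
                    reached.setdefault(q, mark)
                continue
            for q1, sel1 in left.items():
                for q2, sel2 in right.items():
                    q = self.node_rules.get((label, bit, q1, q2))
                    if q is not None:
                        # One annotation per state suffices for a functional
                        # automaton: two annotations reaching the same useful
                        # state would give two accepted annotations of the tree.
                        reached.setdefault(q, mark | sel1 | sel2)
        return reached

    def answer(self, tree: Tree) -> Optional[Selection]:
        """Return the selected node paths, or None if the tree is not accepted."""
        for q, selected in self._run(tree, ()).items():
            if q in self.final:
                return selected
        return None


# Hypothetical query: select every leaf labelled "a" in trees over {f/2, a, b}.
nstt = DeterministicNSTT(
    leaf_rules={("a", 1): "ok", ("b", 0): "ok"},
    node_rules={("f", 0, "ok", "ok"): "ok"},
    final={"ok"},
)
tree = ("f", (("a", ()), ("f", (("b", ()), ("a", ())))))
print(sorted(nstt.answer(tree)))  # [(0,), (1, 1)]
```

    The single bottom-up pass visits each node once and, for a fixed automaton, does a bounded amount of work per node, which is one way to see why answering such queries can be efficient.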

    Web Wrapper Specification Using Compound Filter Learning

    Information available on the Internet is meant to be read by humans, not processed by machines. To access this information automatically, intelligent services are needed that convert HTML documents into more suitable formats such as XML. This can be achieved by generating Web wrappers, programs designed to process the pages of a given Web site. An efficient way to generate such wrappers is to learn them from examples provided by the user. We present such a system, based on the generation, selection and combination of elementary extraction operators that we call filters. The novelty of this approach is that the generated wrappers can easily be read, interpreted and modified by the user.
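    To make the idea of combining elementary extraction operators concrete, here is a minimal Python sketch, assuming that a filter maps a list of document nodes to a list of nodes and that pages are well-formed XHTML parsed with the standard library's `xml.etree.ElementTree` (real-world HTML would need a more tolerant parser). The `Filter` class, the `descendant` and `attr_equals` operators and the sample page are hypothetical illustrations, not the filters of the system described; the generation and selection steps are not shown.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, List

Nodes = List[ET.Element]


@dataclass
class Filter:
    """Hypothetical extraction operator: a node-list transformer plus a readable description."""
    description: str
    apply: Callable[[Nodes], Nodes]

    def then(self, other: "Filter") -> "Filter":
        # Sequential combination: run `other` on this filter's output.
        return Filter(f"{self.description} / {other.description}",
                      lambda nodes: other.apply(self.apply(nodes)))

    def union(self, other: "Filter") -> "Filter":
        # Alternative combination: nodes kept by either filter, duplicates removed.
        def run(nodes: Nodes) -> Nodes:
            out: Nodes = []
            for n in self.apply(nodes) + other.apply(nodes):
                if all(n is not m for m in out):
                    out.append(n)
            return out
        return Filter(f"({self.description} | {other.description})", run)


def descendant(tag: str) -> Filter:
    # Elementary filter: proper descendants of each input node with the given tag.
    return Filter(f"descendant <{tag}>",
                  lambda nodes: [d for n in nodes for d in n.iter(tag) if d is not n])


def attr_equals(name: str, value: str) -> Filter:
    # Elementary filter: keep nodes whose attribute `name` equals `value`.
    return Filter(f"@{name}='{value}'",
                  lambda nodes: [n for n in nodes if n.get(name) == value])


def texts(nodes: Nodes) -> List[str]:
    # Extract the stripped text content of each selected node.
    return [(n.text or "").strip() for n in nodes]


# Hypothetical sample page.
page = ET.fromstring(
    "<html><body>"
    "<div class='item'><span class='name'>Widget</span><span class='price'>9.99</span></div>"
    "<div class='item'><span class='name'>Gadget</span><span class='price'>19.50</span></div>"
    "<div class='ad'><span class='name'>Sponsored</span></div>"
    "</body></html>"
)

names = (descendant("div").then(attr_equals("class", "item"))
         .then(descendant("span")).then(attr_equals("class", "name")))
print(names.description)  # descendant <div> / @class='item' / descendant <span> / @class='name'
print(texts(names.apply([page])))  # ['Widget', 'Gadget']
```

    Because every compound filter carries a description assembled from its parts, the resulting wrapper can be printed as a short, human-readable chain of operators, which is one simple way to keep a learned wrapper readable and editable.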